Challenges for Chemoinformatics Education in Drug Discovery
Surveys the curriculum developed at Indiana University for teaching cheminformatics in the IU School of Informatics.
Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their gene expression profiles. In turn, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results, which extend previously published cluster analyses of these data, are obtained from the Rosetta compendium of expression profiles.
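The co-clustering similarity and its use in hierarchical clustering can be illustrated with a short sketch. It assumes we already have posterior draws of cluster labels from some DPM sampler (the sampler is not shown), and it takes one minus the co-clustering probability as the dissimilarity; that transform, the function names, and the toy draws are our assumptions, not details given in the abstract. A naive average-linkage routine stands in for a library implementation so the block stays self-contained.

```python
from itertools import combinations


def coclustering_matrix(samples):
    """Estimate P(gene i and gene j share a cluster) from posterior label draws.

    `samples` is a list of label vectors, one per MCMC draw. Cluster ids need
    not match across draws, because only co-membership within a draw counts.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


def average_linkage(dist):
    """Naive O(n^3) agglomerative clustering with average linkage.

    Returns merges as (cluster_a, cluster_b, distance), each cluster being a
    frozenset of gene indices.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return merges


if __name__ == "__main__":
    # Hypothetical posterior draws for four genes.
    draws = [[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 1, 1], [0, 0, 1, 1]]
    p = coclustering_matrix(draws)
    dissim = [[1.0 - x for x in row] for row in p]  # assumed transform
    for a, b, d in average_linkage(dissim):
        print(sorted(a), sorted(b), round(d, 3))
```

Because the probabilities are averaged over draws, label switching between iterations does not matter, which is one reason the co-clustering matrix is a convenient summary of DPM output.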
Improving integrative searching of systems chemical biology data using semantic annotation
Background: Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued.
Results: We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology.
Availability: Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. Documentation is available at http://chem2bio2owl.wikispaces.com.
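Why an ontology makes queries richer can be shown with a toy, in-memory example: once classes are linked by subclass relations, asking for instances of a general class also returns instances of its subclasses. The triples, class names (`Drug`, `ApprovedDrug`, `ChemicalCompound`) and the `targets` predicate below are hypothetical and are not terms from Chem2Bio2OWL; a real deployment would run SPARQL against a triple store, where SPARQL 1.1 can express the same inference with a property path such as `?x rdf:type/rdfs:subClassOf* :Drug`.

```python
# Toy (subject, predicate, object) triples; all names are hypothetical.
TRIPLES = [
    ("Drug", "subClassOf", "ChemicalCompound"),
    ("ApprovedDrug", "subClassOf", "Drug"),
    ("aspirin", "type", "ApprovedDrug"),
    ("compound42", "type", "ChemicalCompound"),
    ("aspirin", "targets", "PTGS1"),
    ("compound42", "targets", "PTGS1"),
]


def subclasses(cls, triples):
    """Return cls plus every class that is transitively a subClassOf it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == "subClassOf" and o in result and s not in result:
                result.add(s)
                changed = True
    return result


def instances_of(cls, triples):
    """Subjects typed with cls or any of its subclasses (RDFS-style inference)."""
    classes = subclasses(cls, triples)
    return {s for s, p, o in triples if p == "type" and o in classes}


def query(cls, predicate, obj, triples):
    """Members of cls (with inference) that are linked to obj via predicate."""
    members = instances_of(cls, triples)
    return {s for s, p, o in triples if p == predicate and o == obj and s in members}
```

For example, `query("Drug", "targets", "PTGS1", TRIPLES)` returns only `aspirin`, even though no triple states directly that aspirin is a `Drug`; without the subclass annotations, the query author would have to enumerate every relevant class by hand.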
Variable Selection and Model Averaging in Semiparametric Overdispersed Generalized Linear Models
We express the mean and variance terms in a double exponential regression
model as additive functions of the predictors and use Bayesian variable
selection to determine which predictors enter the model, and whether they enter
linearly or flexibly. When the variance term is null we obtain a generalized
additive model, which becomes a generalized linear model if the predictors
enter the mean linearly. The model is estimated using Markov chain Monte Carlo
simulation and the methodology is illustrated using real and simulated data
sets. Comment: 8 graphs, 35 pages
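Bayesian variable selection output is usually summarized by Monte Carlo frequencies over the MCMC draws. The sketch below assumes each draw records a three-way state per predictor (excluded, linear, flexible); these state codes and function names are invented for illustration, and the sampler and double exponential likelihood are not shown. The `model_class` helper encodes the nesting stated in the abstract, reading "the variance term is null" as "no predictor enters the variance term", which is our assumption.

```python
from collections import Counter

# Hypothetical state codes for one predictor in one MCMC draw.
EXCLUDED, LINEAR, FLEXIBLE = 0, 1, 2


def state_probabilities(draws, names):
    """Posterior probability that each predictor is excluded, linear or flexible."""
    n = len(draws)
    out = {}
    for j, name in enumerate(names):
        counts = Counter(draw[j] for draw in draws)
        out[name] = {
            "excluded": counts[EXCLUDED] / n,
            "linear": counts[LINEAR] / n,
            "flexible": counts[FLEXIBLE] / n,
        }
    return out


def model_class(mean_states, variance_states):
    """Name the special case a single draw corresponds to."""
    if any(s != EXCLUDED for s in variance_states):
        return "double exponential (overdispersed)"
    if all(s != FLEXIBLE for s in mean_states):
        return "GLM"  # null variance term, mean linear in the predictors
    return "GAM"  # null variance term, some flexible mean terms


def model_average(predictions):
    """Model-averaged prediction: mean of the per-draw predictions."""
    m = len(predictions)
    return [sum(p[i] for p in predictions) / m for i in range(len(predictions[0]))]
```

Because each draw is a sample from the posterior over models, averaging per-draw predictions is a Monte Carlo estimate of the model-averaged prediction, so no explicit enumeration of models is needed.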
From Starburst to Quiescence: Testing AGN feedback in Rapidly Quenching Post-Starburst Galaxies
Post-starbursts are galaxies in transition from the blue cloud to the red
sequence. Although they are rare today, integrated over time they may be an
important pathway to the red sequence. This work uses SDSS, GALEX, and WISE
observations to identify the evolutionary sequence from starbursts to fully
quenched post-starbursts in the narrow mass range , and identifies "transiting" post-starbursts which are intermediate
between these two populations. In this mass range, of galaxies are
starbursts, are quenched post-starbursts, and are the
transiting types in between. The transiting post-starbursts have stellar
properties that are predicted for fast-quenching starbursts and morphological
characteristics that are already typical of early-type galaxies. The AGN
fraction, as estimated from optical line ratios, of these post-starbursts is
about 3 times higher () than that of normal star-forming
galaxies of the same mass, but there is a significant delay between the
starburst phase and the peak of nuclear optical AGN activity (median age
difference of Myr), in agreement with previous studies.
The time delay is inferred by comparing the broad-band near NUV-to-optical
photometry with stellar population synthesis models. We also find that
starbursts and post-starbursts are significantly more dust-obscured than normal
star-forming galaxies in the same mass range. About of the starbursts
and of the transiting post-starbursts can be classified as the
"Dust-Obscured Galaxies" (DOGs), while only of normal galaxies are
DOGs. The time delay between the starburst phase and AGN activity suggests that
AGN do not play a primary role in the original quenching of starbursts but may
be responsible for quenching later low-level star formation during the
post-starburst phase. Comment: 30 pages, 18 figures, accepted to Ap
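The age-dating step (comparing NUV-to-optical photometry with stellar population synthesis models) can be sketched in its simplest form as interpolation along a model colour track. Everything here is an assumption for illustration: the single `NUV - r` colour, the track values in `TRACK_AGE` and `TRACK_COLOUR` (placeholders, not output of any synthesis code), and defining the delay as the difference of the two samples' median ages. The paper's actual fitting is not reproduced.

```python
from bisect import bisect_left
from statistics import median

# Hypothetical model track: age (Myr) versus NUV - r colour (mag), assumed
# monotonic. Placeholder numbers only, not from a synthesis model.
TRACK_AGE = [10.0, 100.0, 300.0, 600.0, 1000.0]
TRACK_COLOUR = [1.0, 1.8, 2.6, 3.3, 4.0]


def age_from_colour(colour):
    """Linearly interpolate age along the track, clamping at its ends."""
    if colour <= TRACK_COLOUR[0]:
        return TRACK_AGE[0]
    if colour >= TRACK_COLOUR[-1]:
        return TRACK_AGE[-1]
    k = bisect_left(TRACK_COLOUR, colour)
    c0, c1 = TRACK_COLOUR[k - 1], TRACK_COLOUR[k]
    a0, a1 = TRACK_AGE[k - 1], TRACK_AGE[k]
    return a0 + (a1 - a0) * (colour - c0) / (c1 - c0)


def median_age_difference(reference_colours, comparison_colours):
    """Median age of the comparison sample minus that of the reference sample."""
    ref = median(age_from_colour(c) for c in reference_colours)
    other = median(age_from_colour(c) for c in comparison_colours)
    return other - ref
```

A real analysis would fit several bands at once and marginalize over dust and star formation history; the single-colour lookup only shows how a model grid turns observed colours into ages.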
PubChemSR: A search and retrieval tool for PubChem
Background: Recent years have seen an explosion in the amount of publicly available chemical and related biological information. A significant step has been the emergence of PubChem, which contains property information for millions of chemical structures, and acts as a repository of compounds and bioassay screening data for the NIH Roadmap. There is a strong need for tools designed for scientists that permit easy download and use of these data. We present one such tool, PubChemSR.
Implementation: PubChemSR (Search and Retrieve) is a freely available desktop application written for Windows using Microsoft .NET that is designed to assist scientists in search, retrieval and organization of chemical and biological data from the PubChem database. It employs SOAP web services made available by NCBI for extraction of information from PubChem.
Results and Discussion: The program supports a wide range of searching techniques, including queries based on assay or compound keywords and chemical substructures. Results can be examined individually or downloaded and exported in batch for use in other programs such as Microsoft Excel. We believe that PubChemSR makes it straightforward for researchers to utilize the chemical, biological and screening data available in PubChem. We present several examples of how it can be used.
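Programmatic batch retrieval of the kind PubChemSR offers can be sketched by building request URLs. Note that this swaps in PubChem's PUG REST interface, which is a different API from the SOAP services PubChemSR used, and it is not PubChemSR's code; the helper names are hypothetical. The functions only construct URLs, so nothing here touches the network.

```python
from urllib.parse import quote

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cids, properties, fmt="CSV"):
    """PUG REST URL requesting properties for a batch of compound CIDs."""
    cid_part = ",".join(str(int(c)) for c in cids)
    prop_part = ",".join(properties)
    return f"{BASE}/compound/cid/{cid_part}/property/{prop_part}/{fmt}"


def name_search_url(name, fmt="JSON"):
    """PUG REST URL resolving a compound name to its CIDs."""
    return f"{BASE}/compound/name/{quote(name, safe='')}/cids/{fmt}"
```

For example, `property_url([2244], ["MolecularFormula", "MolecularWeight"])` asks for properties of CID 2244 (aspirin) as CSV, which opens directly in a spreadsheet program; fetching it would be a call such as `urllib.request.urlopen(url)`.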
The identification of post-starburst galaxies at z∼1 using multiwavelength photometry: a spectroscopic verification
Despite decades of study, we still do not fully understand why some massive galaxies abruptly switch off their star formation in the early Universe, and what causes their rapid transition to the red sequence. Post-starburst galaxies provide a rare opportunity to study this transition phase, but few have currently been spectroscopically identified at high redshift (z > 1). In this paper, we present the spectroscopic verification of a new photometric technique to identify post-starbursts in high-redshift surveys. The method classifies the broad-band optical–near-infrared spectral energy distributions (SEDs) of galaxies using three spectral shape parameters (supercolours), derived from a principal component analysis of model SEDs. When applied to the multiwavelength photometric data in the UKIDSS Ultra Deep Survey, this technique identified over 900 candidate post-starbursts at redshifts 0.5 5 Å) and Balmer break, characteristic of post-starburst galaxies. We conclude that photometric methods can be used to select large samples of recently quenched galaxies in the distant Universe.
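The supercolour idea (principal components of model SEDs, then projecting each galaxy's photometry onto them) can be sketched as follows. This is a minimal illustration under stated assumptions: SEDs are normalized to unit total flux so only shape matters (our choice), the components come from plain power iteration with deflation so the block needs no external libraries, and the number of bands, the toy SEDs, and the function names are hypothetical. The classification boundaries in supercolour space are not shown. `k` must not exceed the rank of the data's covariance.

```python
import math
import random


def normalise(sed):
    """Scale an SED so its fluxes sum to one (keeps shape only; an assumption)."""
    total = sum(sed)
    return [f / total for f in sed]


def principal_components(seds, k, iters=500, seed=0):
    """Mean and top-k principal components of normalized SEDs (power iteration)."""
    data = [normalise(s) for s in seds]
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), used for deflation.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def supercolours(sed, mean, comps):
    """Project one normalized SED onto the components: its shape parameters."""
    x = normalise(sed)
    return [sum((x[j] - mean[j]) * c[j] for j in range(len(x))) for c in comps]
```

In practice one would use a linear-algebra library for the decomposition; the point here is only that each supercolour is the amplitude of one principal component in a galaxy's normalized SED.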
Far Infrared and Submillimeter Emission from Galactic and Extragalactic Photo-Dissociation Regions
Photodissociation Region (PDR) models are computed over a wide range of
physical conditions, from those appropriate to giant molecular clouds
illuminated by the interstellar radiation field to the conditions experienced
by circumstellar disks very close to hot massive stars. These models use the
most up-to-date values of atomic and molecular data, the most current chemical
rate coefficients, and the newest grain photoelectric heating rates which
include treatments of small grains and large molecules. In addition, we examine
the effects of metallicity and cloud extinction on the predicted line
intensities. Results are presented for PDR models with densities over the range
n = 10^1 to 10^7 cm^-3 and for incident far-ultraviolet radiation fields over the
range G_0 = 10^-0.5 to 10^6.5, for metallicities Z = 1 and 0.1 times the local
Galactic value, and for a range of PDR cloud sizes. We present line strength
and/or line ratio plots for a variety of useful PDR diagnostics: [C II] 158
micron, [O I] 63 and 145 micron, [C I] 370 and 609 micron, CO J=1-0, J=2-1,
J=3-2, J=6-5 and J=15-14, as well as the strength of the far-infrared
continuum. These plots will be useful for the interpretation of Galactic and
extragalactic far infrared and submillimeter spectra observable with ISO,
SOFIA, SWAS, FIRST and other orbital and suborbital platforms. As examples, we
apply our results to ISO and ground-based observations of M82, NGC 278, and the
Large Magellanic Cloud. Comment: 54 pages, 20 figures, accepted for publication in The Astrophysical
Journal
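Using such line-ratio plots typically means finding the (n, G_0) model that best matches the observed diagnostics. The sketch below does a chi-square search in log space over two of the diagnostics listed above, [O I] 63 / [C II] 158 and ([O I] 63 + [C II] 158) / FIR. The grid values are placeholders invented for illustration, not results from these PDR models, and the uniform 0.1 dex uncertainty and the function name are assumptions.

```python
import math

# Hypothetical grid: (log n [cm^-3], log G_0) -> predicted line ratios.
# Placeholder numbers only; they are NOT output of the PDR models.
GRID = {
    (2.0, 1.0): {"OI63/CII158": 0.3, "(OI63+CII158)/FIR": 3e-3},
    (3.0, 2.0): {"OI63/CII158": 1.0, "(OI63+CII158)/FIR": 2e-3},
    (4.0, 3.0): {"OI63/CII158": 3.0, "(OI63+CII158)/FIR": 1e-3},
    (5.0, 4.0): {"OI63/CII158": 10.0, "(OI63+CII158)/FIR": 4e-4},
}


def best_fit(observed, sigma_dex=0.1, grid=GRID):
    """Grid point minimizing chi^2 between observed and model log10 ratios."""
    def chi2(pred):
        return sum(
            ((math.log10(observed[key]) - math.log10(pred[key])) / sigma_dex) ** 2
            for key in observed
        )
    return min(grid, key=lambda point: chi2(grid[point]))
```

Working in log ratios treats a factor-of-two mismatch the same way at every ratio level, which suits diagnostics that span several orders of magnitude across the grid.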